Unsupervised Translation Disambiguation for Cross-Domain Statistical Machine Translation
Authors
Abstract
Most attempts at integrating word sense disambiguation with statistical machine translation have focused on supervised disambiguation approaches. These approaches are of limited use when the distribution of the test data differs strongly from that of the training data; however, word sense errors tend to be especially common under exactly these conditions. In this paper we present different approaches to unsupervised word translation disambiguation and apply them to the problem of translating conversational speech under resource-poor training conditions. Both human and automatic evaluation metrics show significant improvements resulting from our technique.
Similar resources
Contextual Modeling for Meeting Translation Using Unsupervised Word Sense Disambiguation
In this paper we investigate the challenges of applying statistical machine translation to meeting conversations, with a particular view towards analyzing the importance of modeling contextual factors such as the larger discourse context and topic/domain information on translation performance. We describe the collection of a small corpus of parallel meeting data, the development of a statistica...
The TÜBİTAK-UEKAE statistical machine translation system for IWSLT 2009
We describe our Arabic-to-English and Turkish-to-English machine translation systems that participated in the IWSLT 2009 evaluation campaign. Both systems are based on the Moses statistical machine translation toolkit, with added components to address the rich morphology of the source languages. Three different morphological approaches are investigated for Turkish. Our primary submission uses l...
Unsupervised Resolution of Acronyms and Abbreviations in Nursing Notes Using Document-Level Context Models
Automatic simplification of clinical notes continues to be an important challenge for NLP systems. A frequent obstacle to developing more robust NLP systems for the clinical domain is the lack of annotated training data. This study investigates unsupervised techniques for one key aspect of medical text simplification, viz. the expansion and disambiguation of acronyms and abbreviations. Our appr...
Resolving Translation Ambiguity Using Non-Parallel Bilingual Corpora
This paper presents an unsupervised method for choosing the correct translation of a word in context. It learns disambiguation information from untagged, non-parallel bilingual corpora (preferably in the same domain). Our method combines two existing unsupervised disambiguation algorithms: a word sense disambiguation algorithm based on distributional clustering and a translation disambigu...
Disambiguating Temporal–Contrastive Discourse Connectives for Machine Translation
Temporal–contrastive discourse connectives (although, while, since, etc.) signal various types of relations between clauses such as temporal, contrast, concession and cause. They are often ambiguous and therefore difficult to translate from one language to another. We discuss several new and translation-oriented experiments for the disambiguation of a specific subset of discourse connectives in...